[NAS Backup] Suppress Errors in Disk Usage Calculation that Caused Backup to Fail #13424

Open
daviftorres wants to merge 15 commits into apache:main from daviftorres:nas-backup-failed

daviftorres commented Jun 15, 2026 (Contributor)

Description

This PR aims to prevent a backup job that has actually succeeded from being marked as failed because of an error in its statistics (disk usage) section.


It also appears to fix some silent failures I previously reported in #11727.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

daviftorres commented Jun 15, 2026 (Contributor Author)

This is the equivalent command for applying the fix:

sed -i 's_du -sb $dest | cut -f1_du -sb $dest 2>/dev/null | cut -f1 || true_g' /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/nasbackup.sh
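Before running the one-liner against the installed script, its effect can be checked on a throwaway file. This is a minimal sketch assuming GNU sed (for `-i` without a backup suffix). The sample line is hypothetical and only matches because it uses the same unquoted `$dest` form as the pattern; if the installed script quotes `"$dest"`, the pattern will not match and sed leaves the file unchanged.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway file containing a hypothetical line in the form the pattern targets.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'du -sb $dest | cut -f1' > "$tmp"

# Same substitution as the one-liner, pointed at the copy instead of
# /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/nasbackup.sh.
# '_' is the delimiter, so the '/' in 2>/dev/null needs no escaping.
sed -i 's_du -sb $dest | cut -f1_du -sb $dest 2>/dev/null | cut -f1 || true_g' "$tmp"

cat "$tmp"
```

This prints `du -sb $dest 2>/dev/null | cut -f1 || true`. Running the substitution a second time is a no-op, because the rewritten line no longer contains `$dest | cut` contiguously.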

We haven't confirmed the exact root cause of the du failure yet. As a precaution, we applied this fix to all servers and will monitor backups over the next few days.

So I am running tests with stderr redirected via 2>>/var/log/cloudstack/agent/nasbackup.err to see the actual error message.
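While the root cause is still unknown, du's stderr can be appended to a log instead of being discarded, so the backup still completes but the error text is kept. This is a sketch under assumptions: `measure_backup_size` is a hypothetical helper (not part of nasbackup.sh), the default log path is the one from the comment above, and the script is assumed to run with `set -eo pipefail`.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Default taken from the comment above; override LOG_FILE to log elsewhere.
LOG_FILE=${LOG_FILE:-/var/log/cloudstack/agent/nasbackup.err}

# Hypothetical helper: print the apparent size of $1 in bytes, or nothing if
# du fails. du's stderr is appended to $LOG_FILE instead of being discarded,
# and "|| true" keeps a failure from aborting the script.
# If the log directory does not exist, the redirection itself fails and the
# size is simply empty.
measure_backup_size() {
  local dest=$1 size
  size=$(du -sb "$dest" 2>>"$LOG_FILE" | cut -f1) || true
  printf '%s' "$size"
}
```

Called as `backup_size=$(measure_backup_size "$dest")`, it tolerates a du failure the same way as the proposed fix, but leaves the error message in the log for diagnosis.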

daviftorres marked this pull request as ready for review June 16, 2026 15:02
Add timeout for unmounting backup mount point and cleanup.

daviftorres commented Jun 17, 2026 (Contributor Author)

Proposed Changes Rationale

backup_size=$(du -sb "$dest" 2>/dev/null | cut -f1) || true
  • NFS issues may cause the du command to fail.
  • A size retrieval failure should not invalidate a successful backup.
timeout 60 umount "$mount_point" 2>/dev/null || true
rmdir "$mount_point" 2>/dev/null || true
  • Another process may keep the device busy (e.g., parallel backups).
  • Network issues may cause hangs on NFS.
  • Cleanup failures should not invalidate a successful backup.
echo -n "$backup_size"
  • Outputs the size at the end to confirm the script completed past the potentially problematic commands.
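Whether the `|| true` matters depends on how the script is run. The sketch below assumes the script runs with `set -eo pipefail` (an assumption for illustration) and uses a nonexistent directory as a stand-in for an NFS path where du fails. It compares the strict size lookup with the guarded form proposed here, each in its own subshell.

```shell
#!/usr/bin/env bash
# /nonexistent-backup-dir stands in for a backup path on which du fails.

# Strict form: under errexit + pipefail, the failed du aborts the subshell
# before the final echo is reached.
strict() (
  set -eo pipefail
  backup_size=$(du -sb /nonexistent-backup-dir 2>/dev/null | cut -f1)
  echo -n "reached:$backup_size"
)

# Guarded form from this PR: the failure is swallowed and execution continues
# with an empty size.
guarded() (
  set -eo pipefail
  backup_size=$(du -sb /nonexistent-backup-dir 2>/dev/null | cut -f1) || true
  echo -n "reached:$backup_size"
)

strict_out=$(strict)
strict_rc=$?
guarded_out=$(guarded)
guarded_rc=$?
echo "strict:  rc=$strict_rc out='$strict_out'"
echo "guarded: rc=$guarded_rc out='$guarded_out'"
```

The strict variant exits with status 1 and prints nothing, while the guarded variant exits 0 and prints `reached:`. This is also why echoing the size at the end works as a completion marker.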

daviftorres commented (Contributor Author)

Dear @abh1sar, could you help me with this bug? Regards,

codecov Bot commented Jun 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 18.89%. Comparing base (957bfbb) to head (edc80b1).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main   #13424   +/-   ##
=========================================
  Coverage     18.88%   18.89%           
- Complexity    18223    18226    +3     
=========================================
  Files          6174     6174           
  Lines        555226   555226           
  Branches      67774    67774           
=========================================
+ Hits         104872   104895   +23     
+ Misses       438834   438810   -24     
- Partials      11520    11521    +1     
Flag Coverage Δ
uitests 3.53% <ø> (ø)
unittests 20.09% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Copilot AI review requested due to automatic review settings June 18, 2026 20:07

Copilot AI left a comment

Pull request overview

This PR adjusts the KVM NAS backup script’s “statistics/cleanup” section so that failures while computing backup disk usage (and related cleanup commands) don’t cause an otherwise successful backup job to be marked as failed.

Changes:

  • Capture du output into backup_size and suppress du stderr to avoid failing the script during size calculation.
  • Add timeout around umount and suppress errors from umount/rmdir.
  • Emit the computed backup size at the end of backup_running_vm().

Comment on lines +199 to +203
backup_size=$(du -sb "$dest" 2>/dev/null | cut -f1) || true

timeout 60 umount "$mount_point" 2>/dev/null || true
rmdir "$mount_point" 2>/dev/null || true
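The effect of the `timeout` wrapper can be checked without an NFS mount. This sketch uses a hypothetical `best_effort` helper that is not part of nasbackup.sh. It runs a command under GNU coreutils `timeout`, which exits with status 124 when the time limit is reached, swallows any failure like the `|| true` in this PR, and reports what happened on stderr instead of discarding it. `sleep` and `false` stand in for a hung and a failing `umount`.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical helper: run "$@" with a time limit of $1 seconds, never let a
# failure propagate, and report timeouts/failures on stderr.
best_effort() {
  local limit=$1
  shift
  local rc=0
  timeout "$limit" "$@" 2>/dev/null || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "cleanup: '$*' timed out after ${limit}s" >&2
  elif [ "$rc" -ne 0 ]; then
    echo "cleanup: '$*' failed with status $rc" >&2
  fi
  return 0
}

# In the script this could wrap the unmount, e.g.:
#   best_effort 60 umount "$mount_point"
# Stand-ins so the sketch runs anywhere: a hang and a plain failure.
best_effort 1 sleep 5
best_effort 5 false
echo "script continues"
```

One caveat: `timeout` sends SIGTERM and then waits for the command to exit. A process that does not respond to SIGTERM can therefore still hold the script. GNU `timeout --kill-after=DURATION` additionally sends SIGKILL if the command is still running after that delay.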
